15 research outputs found

    An EM Algorithm for Convolutive Independent Component Analysis

    No full text
    In this paper, we address the problem of blind separation of convolutive mixtures of spatially and temporally independent sources modeled with mixtures of Gaussians. We present an EM algorithm to compute Maximum Likelihood estimates of both the separating filters and the source density parameters, whereas state-of-the-art methods usually estimate the separating filters with gradient descent techniques.

    Inference of Variable-Length Linguistic and Acoustic Units By Multigrams

    No full text
    The efficiency of pattern recognition algorithms depends heavily on a proper definition of the patterns assumed to structure the data. The multigram model provides a statistical tool to retrieve sequential variable-length regularities within streams of data. In this paper, we present a general formulation of the model, applicable to single or multiple parallel strings of data having either discrete or continuous values. The model is first used to derive an inventory of variable-length sequences of letters from text data in which all spaces between the words have been removed. It turns out that the sequences of letters inferred during this fully unsupervised procedure clearly relate to the morphological structure of the text. The model is then used to infer a set of variable-length acoustic units directly from speech data. Speech files containing examples of acoustic units are provided along with this paper in order to illustrate their consistency from an auditory point of view.

    Learning a syntagmatic and paradigmatic structure from language data with a bi-multigram model

    The present study was designed to examine educational management, to define its relationship with the level of knowledge of institutional communication, and to determine the quality of principals' training and its impact on educational management in the technical-productive education centers of the city of Juliaca, 2016. Materials and methods: A sample of 18 principals was obtained. Results: Institutional management was at regular levels in 62%, expected in 25.3% and in process in 48.1%; administrative management was at regular levels in 65%; and pedagogical management was likewise at regular levels in 65%, with respect to the level of knowledge of institutional communication among the principals of the technical-productive education centers of the city of Juliaca, 2016. Conclusions: Educational management is directly related to the level of knowledge of communication, with r = 0.485, which indicates a low or weak relationship according to the Pearson correlation table, among the principals of technical-productive education in the city of Juliaca, 2016. Tesi

    data with a bi-multigram

    No full text
    a syntagmatic and paradigmatic structure from languag

    Language Modeling By Variable Length Sequences : Theoretical Formulation And Evaluation Of Multigrams

    No full text
    The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a Maximum Likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative Expectation-Maximization algorithm and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task. 1. INTRODUCTION Language can be viewed as a stream of words put out by a source. Because this source is subject to syntactic and semantic constraints, words are not independent, but the dependencies are of variable length. One can therefore expect to retrieve, in a corpus of text, typical variable-..

    Automatic Generation And Selection Of Multiple Pronunciations For Dynamic Vocabularies

    No full text
    In this paper, we present a new scheme for the acoustic modeling of speech recognition applications requiring dynamic vocabularies. It applies especially to the acoustic modeling of out-of-vocabulary words which need to be added to a recognition lexicon based on the observation of a few (say one or two) speech utterances of these words. Standard approaches to this problem derive a single pronunciation from each speech utterance by combining acoustic and phone transition scores. In our scheme, multiple pronunciations are generated from each speech utterance of a word to enroll by varying the relative weights assigned to the acoustic and phone transition models. In our experiments, the use of these multiple baseforms dramatically outperforms the standard approach with a relative decrease of the word error rate ranging from 20% to 40% on all our test sets. 1. MOTIVATION Speech recognition systems usually rely on a fixed lexicon where the pronunciations of the vocabulary words are given ..

    Introducing Statistical Dependencies and Structural Constraints in Variable-Length Sequence Models

    No full text
    this paper, our goal is therefore twofold: -- to demonstrate theoretically the possibility to relax this important assumption of the original multigram model

    Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)

    No full text
    Although current automatic speech recognition (ASR) systems perform remarkably well for a variety of recognition tasks in clean audio conditions, their accuracy degrades with increasing levels of environment noise. New approaches are needed to handle the ASR lack of robustness to noise. In this paper, we propose a multi-sensor approach to ASR, where visual information, in addition to the standard audio information, is obtained from the speaker's face in a second channel. Audio-visual ASR, where both an audio channel and a visual channel are input to the recognition system, has already been demonstrated to outperform traditional audio-only ASR in noise conditions [5] [6]. In addition to audio-visual ASR, the visual modality has been investigated as a means of enhancement, where clean audio features are estimated from audio-visual speech when the audio channel is corrupted by noise [3] [4]. However, in [4] for example, the ASR performance of linear audio-visual enhancement (where clean audio features are estimated via linear filtering of the noisy audio-visual features) remains significantly inferior to the performance of audio-visual ASR. In this paper, we introduce a non-linear enhancement technique called Audio-Visual Codebook Dependent Cepstral Normalization (AVCDCN) and we consider its use with both audio-only ASR and audio-visual ASR. AVCDCN is inspired fro